
Add a line continuation character '⤸' #29273

Closed (proposed to merge 1 commit)


@c42f (Member) commented Sep 19, 2018

Closes #27533 by adding a line continuation character which is only valid in space-sensitive parsing contexts (see #27533 (comment)).

I've often wanted a nice way to continue space-separated argument lists for macros across multiple lines. Resorting to function call syntax isn't very satisfying because it's syntactically quite different from the way macros usually look in Julia code.

Demo:

# Valid - space sensitive context
@info "A message which could be rather long" ⤸
      a="something more"                     ⤸
      b="another thing"

# Invalid - there should only be one way to do this
x = some_variable ⤸
    + other_variable
# Valid - the current, perfectly good convention for writing this
x = some_variable +
    other_variable
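The parser itself shows why the trailing-operator convention works and the leading-operator form does not. This is a minimal sketch that assumes a recent Julia; `statements` is a hypothetical helper that wraps a snippet in `begin ... end` and drops line-number metadata:

```julia
# Hypothetical helper: parse a snippet as a block and return its statements,
# ignoring the LineNumberNode entries the parser inserts.
function statements(src::AbstractString)
    block = Meta.parse("begin\n" * src * "\nend")
    return filter(x -> !(x isa LineNumberNode), block.args)
end

# Trailing operator: the first line is incomplete, so parsing continues.
trailing = statements("x = some_variable +\n    other_variable")
# Leading operator: the first line is already a complete statement.
leading = statements("x = some_variable\n    + other_variable")

println(length(trailing), " ", length(leading))  # prints "1 2"
```

In the second case the parser sees `x = some_variable` followed by a separate unary `+other_variable`. That is why a leading `+` cannot continue a line.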

@@ -503,6 +503,18 @@
(skip-multiline-comment port 1))
(skip-to-eol port)))

(define (maybe-continue-line port)
(if (not space-sensitive)
(error "Line continuation '⤸' is only allowed for space separated lists like macro arguments and matrix literals"))
Review comment (Member Author):

Hmm. I'm having second thoughts about using space-sensitive in the lexer right here. It seemed like a good idea when I got up this morning but I'm not so sure now.

Needs further testing.

@JeffBezanson (Member)

I wonder if there is a unicode character more closely designed for something like this, e.g. U+2B90 ("return left") or U+23CE ("return symbol")?

@c42f (Member Author) commented Sep 20, 2018

Yes, it would be much better to have a unicode symbol which is semantically correct rather than just visually pleasing. The fact that U+2938 is classified as a math symbol makes it a questionable choice in that sense.

If there's a deeper objection to having this feature at all, please let me know. There are several independent dimensions along which this might be a good or bad idea.

@c42f (Member Author) commented Sep 20, 2018

I see those 👎's, but which dimension do they refer to? There are many:

  1. The implementation sucks (not a big problem, can be fixed)
  2. The choice of unicode character is poor (trivially fixed - suggestions welcome)
  3. There's already a way to do this with the multiline comment syntax. (IMO ugly/hacky, but that's subjective)
  4. Restricting this to whitespace-sensitive contexts is confusing (I'd argue this simply makes the existing whitespace sensitivity more ergonomic)
  5. Insert your reason here
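Option 3 can be checked with `Meta.parse`. A `#= ... =#` comment that spans the line break makes a continued macro call parse identically to the one-line form. This is a minimal sketch that assumes only `Base.Meta`; the macro is never expanded, so `@info` only needs to parse:

```julia
# The multiline comment swallows the newline, so the macro call continues
# as if it had been written on a single line.
one_line  = Meta.parse("@info \"A message\" a=1 b=2")
continued = Meta.parse("@info \"A message\" #=\n      =# a=1 #=\n      =# b=2")

println(one_line == continued)  # prints "true"
```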

@strickek (Contributor)

Would this also help to structure code of long Regex strings?

@c42f (Member Author) commented Sep 20, 2018

@strickek no, we use the PCRE library for regex syntax. Luckily PCRE already supports free spacing mode so you can just use that:

julia> match(r"(
                a+ |
                b
               )"x, "aa")
RegexMatch("aa", 1="aa")

(note the x suffix on the regex string)
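Free-spacing mode also treats `#` up to the end of the line as a comment inside the pattern, which helps with long regexes. In this small sketch the date pattern is only an illustration, and a triple-quoted `r"""..."""` literal lets the pattern span several lines:

```julia
# With the x flag, PCRE ignores unescaped whitespace (including newlines)
# in the pattern and treats `#` to end-of-line as a comment.
date_re = r"""
    (\d{4})   # year
    -         # separator
    (\d{2})   # month
    """x

m = match(date_re, "released 2018-09")
println(m.captures)
```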

@c42f (Member Author) commented Sep 20, 2018

A quick survey on google images suggests that, to the extent that text editors provide any reasonable visual feedback about line wrapping, a little curly arrow of the general shape of U+2938 is probably the most common. Bad font support for some arrows like U+2B90 could be a problem.

Handy list of many arrows: https://en.wikipedia.org/wiki/Arrow_(symbol)#UnicodeBlocks

Some more:

  • U+21A9 ↩ LEFTWARDS ARROW WITH HOOK
  • U+21B5 ↵ DOWNWARDS ARROW WITH CORNER LEFTWARDS
    ...

@JeffBezanson (Member)

I think many people are against basic syntax features that require unicode. We could come up with an ASCII alternative, but then the problem is that this doesn't seem valuable enough to be worth stealing any ASCII characters.

I agree that some good use cases for this exist, but overall it leads to a strange kind of code formatting where if you fail to spot one character at the end of a line you misinterpret the whole structure. For instance

[0 1 0  ⤸
 1 0 1]

looks for all the world like a 2x3 matrix but is actually 1x6. If your font lacks that character then you're really in trouble --- very little chance of getting the meaning right.
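Today's `#= =#` workaround can stand in for ⤸ to approximate this ambiguity. The same six numbers form a 2×3 matrix or a 1×6 row depending on whether the newline survives. A sketch:

```julia
# A newline inside [...] starts a new row...
two_by_three = [0 1 0
                1 0 1]

# ...unless a multiline comment swallows it, giving the "continued"
# reading: a single long row.
one_by_six = [0 1 0 #=
              =# 1 0 1]

println(size(two_by_three), " ", size(one_by_six))  # prints "(2, 3) (1, 6)"
```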

@StefanKarpinski (Member)

Note that this does not actually encode a "return" but rather its opposite: "no return", or "continuation of the current line". As far as I can tell, there is no Unicode character representing either concept, so an arrow that looks like it means the right thing (which ⤸ does, IMO) seems about as good as possible.

@c42f (Member Author) commented Sep 20, 2018

Thanks Jeff, I agree those are the main tradeoffs here.

@StefanKarpinski Yes I was thinking that a return-like glyph is kind of backward in some sense, because the real function of this is to undo the newline. Which leads to considering such odd options as U+21B0 and U+21AB

@info "blah" ↰
      a b c

@info "blah" ↫
      a b c

@JeffBezanson (Member)

Or maybe U+2026 (…).

For now I'm just playing along here because unicode is fun; if we can find a really good character for this it might sweeten the deal enough.

@ararslan (Member)

which dimension do they refer to

3 for me. This does not seem important enough to warrant the addition of some confusing, Unicode-only, easy to misuse "feature."

@c42f (Member Author) commented Sep 21, 2018

Thanks @ararslan (having just a single line of comment is so much more useful and fun than a bald 👎).

I assume by misuse you mean splitting matrices across lines in unnatural ways. I'd point out that the Julia parser already allows abominations like the following:

julia> [1+
        2 3#=
        =# 4]
1×3 Array{Int64,2}:
 3  3  4

Of course, pointing out existing nasties is never a good argument to encourage more of the same. But it's a fact that we already rely on the user's good taste in exchange for having a concise way to write array literals.

@RuiRojo commented Sep 21, 2018

(Sleepy newbie here)

What about characters at the beginning of the line that represent a grouping? They would always be visible, couldn't be missed even when the line becomes long, and hopefully their position as the first characters on the line (plus a space?) would keep them from conflicting with other usages.

let
⎡  a = [2 3 4 5 6 7 8 910 11 12 13 14 2526 456 23 78 6 ]

   a[2] = 5354
⎡  @info "blah" 
⎣         a b c
⎡  @onfi "bleh" 
⎣         c b a

end

@StefanKarpinski (Member)

What about characters at the beginning of the line that represent like a cell grouping?

Excellent idea. However, we have that already: parentheses.

@RuiRojo commented Sep 21, 2018

Ah, ok. Maybe if they worked more robustly it would solve some people's concerns?

julia> ( @macroexpand @x 
       34 345 235 2en en shte 
       noashent oasehtn 
       oasehto aesht
       )
ERROR: syntax: missing comma or ) in argument list

@StefanKarpinski (Member)

Ah, yes, that's just a parser bug.

@RuiRojo commented Sep 21, 2018

There's this too

julia> ([1 2 3
       4 5 6]
       )
2×3 Array{Int64,2}:
 1  2  3
 4  5  6

@c42f (Member Author) commented Sep 21, 2018

Or maybe U+2026 (…).

I think this is one of the most interesting so far. Unlike arrows, it

  • Has precedent, both in MATLAB and in typography.
  • Can't be confused with existing editor GUI conventions for autowrapped lines

Typographic convention also suggests the em dash as the correct mark for interrupted dialogue, which is an interesting analogy. Maybe not an exciting choice, but its ASCII equivalent "---" is interesting and might be both free and readable enough:

@info "blah" ---
      a b c

@info "blah" —
      a b c

@c42f (Member Author) commented Sep 21, 2018

What about characters at the beginning of the line that represent a grouping?

It's an interesting idea, but to solve the problem at hand the continuation character needs to sit next to the newline. It's possible (if not desirable) to have one continued construct end on a line and another begin on the same line. Left-hand-side delimiters would also wreak havoc with the normal flow of editing text: you should be able to edit on the right-hand side without messing up the indentation of the rest of the line.

@JeffBezanson (Member)

On the parentheses idea, parens are in no way intended to negate the meaning of line breaks. For example you might have

([1 2
  3 4], x)

which is a tuple of a 2x2 matrix and x.

We could maybe allow this for the macro case that Stefan flagged as a bug:

(@x a b
 c d)  # further macro arguments

However, I believe there are no cases (except doc strings) where space-delimited macro arguments can continue onto a new line. Since this gives an error now it seems feasible, but is still kind of a weird special case. I'm also not sure people really want to use this "lisp syntax" style.

@stevengj (Member) commented Sep 21, 2018

… is a bad choice, in my opinion:

  • It closely resembles the ASCII ..., which is used for splatting.
  • Both it and .. are parsed as binary operators, so repurposing … would be breaking as well as confusing, and they seem to be quite useful in this form (e.g. a..b is used in ApproxFun to represent intervals).

An em dash is not breaking but is confusing because it is nearly identical to a hyphen - in many fixed-width fonts.

⤸ at least is not breaking, because it is not a valid identifier or operator in Julia 1.0. But I agree with @ararslan: there is already a way to do this via #= =#, and the need is not great enough to justify adding a more compact syntax.

@arthurp commented Sep 21, 2018

I have a specific use case to throw in here: (Picking "---" arbitrarily, I'm not taking a stance in this comment)

@function_properties associative commutative ---
function f(x, y)
    ...
end

(or similar applied to structs or even loops)

Here we have a macro which applies some descriptors to a function. There may be multiple descriptors, so they can take up a good bit of horizontal space. The most obvious place to break the line (immediately before function) is invalid due to macro parsing rules. I don't think the other solutions that have been proposed address this very well either:

  • Parens: There would need to be a paren after the end in function. This seems really easy to miss and hard to understand since functions are relatively long statements.
  • #= =#: Requires a symbol before function which will reduce the ability to recognize function declarations in code.

So I like the idea of an end-of-line marker that effectively strips the newline at parse time.

@ararslan (Member)

@arthurp You can put it in a block, then peel out the :block expression inside the macro if it requires a :function head.

@function_properties associative commutative begin
    function f(x, y)
        ...
    end
end

Or simply use the less ambiguous

@function_properties(associative, commutative,
    function f(x, y)
        ...
    end
)
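Here is a rough sketch of both suggestions. A hypothetical `@function_properties` accepts either a `begin ... end` block or a trailing function argument, peels off the `:block`, and checks for a `:function` head. `FUNCTION_PROPERTIES` is an illustrative registry, not an existing API, and only the plain `function name(args...) ... end` form is handled (no `where` clauses or short-form definitions):

```julia
# Illustrative registry only; not an existing API.
const FUNCTION_PROPERTIES = Dict{Symbol,Vector{Symbol}}()

macro function_properties(args...)
    props = Symbol[args[1:end-1]...]   # e.g. [:associative, :commutative]
    def = args[end]
    # `@function_properties a b begin ... end`: peel off the :block and keep
    # the single expression inside it (skipping LineNumberNodes).
    if def isa Expr && def.head === :block
        stmts = filter(x -> !(x isa LineNumberNode), def.args)
        length(stmts) == 1 || error("expected exactly one definition in the block")
        def = stmts[1]
    end
    (def isa Expr && def.head === :function) ||
        error("expected a `function name(args...) ... end` definition")
    name = def.args[1].args[1]          # function f(x, y) ... end  ->  :f
    return quote
        $(esc(def))                     # define the function in the caller's scope
        FUNCTION_PROPERTIES[$(QuoteNode(name))] = $props
        $(esc(name))                    # return the function itself
    end
end

# Block form
@function_properties associative commutative begin
    function f(x, y)
        x + y
    end
end

# Call form
@function_properties(associative, commutative,
    function g(x, y)
        x * y
    end
)
```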

@c42f (Member Author) commented Sep 22, 2018

@arthurp great example.

@stevengj It looks like a bug, but … appears not to be parsed as an infix operator right now:

julia> :(1 … 2)
ERROR: syntax: missing comma or ) in argument list

julia> :(…(1,2))
:(1 … 2)

@stevengj (Member) commented Sep 22, 2018

@c42f, that appears to be a bug (in #26262?). Base.operator_precedence(:…) returns 10, the same as Base.operator_precedence(:..), and similar for and other "dotty" symbols.

Oh, @JeffBezanson already fixed this in #29314
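To check how a given Julia version treats `…`, you can ask the parser directly. This sketch assumes a Julia release that includes the fix mentioned above. If `…` parses as an infix call and shares a precedence group with `..`, then repurposing it as a continuation marker would be breaking:

```julia
# Parse `1 … 2` and compare `…` with `..`, which is already an infix operator.
ex = Meta.parse("1 … 2")
println(ex)
println(Base.operator_precedence(:…) == Base.operator_precedence(:..))
```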

@ronisbr (Member) commented Sep 25, 2018

Hi guys,

What I really don't like about the #= =# approach is that it is a comment, and comments can usually be removed without altering the code's behavior. Someone might later remove it unadvisedly, drastically changing what the code means.
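This fragility is easy to demonstrate. Deleting the comment markers leaves the newline in place, so the parser sees two statements instead of one macro call. The sketch below uses a hypothetical `statements` helper, and `@m` is an arbitrary macro name that only needs to parse:

```julia
# Hypothetical helper: parse a snippet as a block and return its statements.
function statements(src::AbstractString)
    block = Meta.parse("begin\n" * src * "\nend")
    return filter(x -> !(x isa LineNumberNode), block.args)
end

with_comment = "@m a #=\n   =# b"  # continuation via a multiline comment
stripped     = "@m a\n   b"         # same text with the comment markers deleted

println(length(statements(with_comment)), " ", length(statements(stripped)))  # prints "1 2"
```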

@JeffBezanson JeffBezanson added the parser Language parsing and surface syntax label Oct 1, 2018
@TAGC commented Oct 9, 2018

I'm sad because I'm learning Julia and I thought macros that take multi-line expressions as arguments would be able to accept them on newlines, like annotations in other languages. I was trying to see if something like this would be possible:

@output foo::T
@output bar::T
function baz(a::T, b::T) where {T}
    (foo=a, bar=b)
end

# Equivalent to:
function baz(a::T, b::T)::NamedTuple{(:foo,:bar),Tuple{T,T}} where {T}
    (foo=a, bar=b)
end

@stevengj (Member) commented Oct 9, 2018

@TAGC, you can always do @foo begin ... end to apply a macro to multiple lines and multiple expressions. (Julia macros are vastly more powerful than Python function annotations, so they shouldn't make you sad!)

@TAGC commented Oct 9, 2018

@stevengj The code I posted is what I consider the ideal way of expressing what I want (for convenience I'll reproduce it below as well).

@output foo::T
@output bar::T
function baz(a::T, b::T) where {T}
    (foo=a, bar=b)
end

Can you show me what's the closest valid Julia I can get to that snippet using begin...end syntax?

@JeffBezanson (Member)

I think the problem there is just needing better syntax for named tuple types. For example it could be

function baz(a::T, b::T)::@NT(foo::T, bar::T) where {T}
    (foo=a, bar=b)
end

@TAGC commented Oct 9, 2018

@JeffBezanson Yeah, that's almost exactly what I was planning to do as a fallback if the syntax I wanted wasn't possible (FWIW I found this issue from #18612):

macro ret(declarations::Expr...)
    # TODO
end

function baz(a::T, b::T)::@ret(foo::T, bar::T) where {T}

end

@TAGC commented Oct 9, 2018

For what it's worth, I went ahead and tried implementing @ret. This is what I came up with:

macro ret(declarations::Expr...)
    function parse(declaration::Expr)
        @assert declaration.head == :(::)
        @assert length(declaration.args) == 2
        name = declaration.args[1]
        rettype = declaration.args[2]
        return (name, rettype)
    end

    function combinetypes(types...)
        temp = join((string(t) for t in types), ",")
        Meta.parse("Tuple{$(temp)}")
    end

    pairs = [parse(d) for d in declarations]
    output_names, output_types = collect(zip(pairs...))
    combined_types = combinetypes(output_types)
    
    :(NamedTuple{$output_names, $combined_types})
end

@generated function baz(a::T, b::T)::@ret(foo::T, bar::T) where {T}
    (foo=zero(T), bar=one(T))
end

@show baz(2, 3)
@show baz(2.0, 3.0)
nothing

Output

julia> include("experiment.jl")
baz(2, 3) = (foo = 0, bar = 1)
baz(2.0, 3.0) = (foo = 0.0, bar = 1.0)

It ended up a little uglier than I hoped. I had to declare the function as @generated in order to make this work with generics. I've only been using Julia for a couple of days so perhaps there's a better way to do this.
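Two details in the snippet above may explain the need for `@generated`. First, `combinetypes(output_types)` passes the whole tuple as a single argument, so the string it builds is `Tuple{(:T, :T)}`. Second, the returned expression is not wrapped in `esc`, so macro hygiene likely resolves `T` in the macro's module rather than as the method's type parameter. The sketch below is an alternative under those assumptions. It builds the `Tuple{...}` expression directly, escapes the result, and accepts only `name::Type` arguments:

```julia
# Sketch: @ret(name::Type, ...) expands to NamedTuple{(:name, ...), Tuple{Type, ...}}.
macro ret(declarations...)
    field_names = Symbol[]
    field_types = Any[]
    for d in declarations
        # Each argument must have the form `name::Type`.
        (d isa Expr && d.head === :(::) && length(d.args) == 2) ||
            throw(ArgumentError("@ret expects `name::Type`, got $d"))
        push!(field_names, d.args[1])
        push!(field_types, d.args[2])
    end
    # Build Tuple{...} as an expression (no string round-trip) and escape the
    # result so `T` refers to the caller's type parameter.
    return esc(:(NamedTuple{$(Tuple(field_names)), Tuple{$(field_types...)}}))
end

# No @generated needed with this version.
function baz(a::T, b::T)::@ret(foo::T, bar::T) where {T}
    (foo = zero(T), bar = one(T))
end

@show baz(2, 3)
@show baz(2.0, 3.0)
```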

@arthurp commented Oct 9, 2018

I think @ret is an interesting idea, however it's not really relevant to this PR. I think it would be more useful to create a separate issue on the topic of better syntax for NamedTuple types.

@c42f (Member Author) commented Oct 12, 2018

I feel this whole thing might still fly if we could come up with a compelling enough pair of ASCII characters for line continuation.

Some desirable properties:

  • It should be short but not steal useful character combinations. A character pair would be ideal here.
  • Any text after a line continuation and on the same line should be considered a comment. So ideally the line continuation pair would end with the existing comment character #.
  • Ideally it's a syntax error in current Julia, so that code using it fails with a clear syntax error on older versions.

@c42f (Member Author) commented Oct 12, 2018

Some possibilities which somewhat fit the above criteria:

  • =# is currently a syntax error in macro argument lists, though not hcat. This would reuse the closing comment marker for multiline comments. The overload could possibly be problematic for syntax highlighting systems.
  • .# is currently a syntax error when preceded by whitespace, so fits all the above criteria though is not exactly visually pleasing.
  • \# is a great analogy to other languages and is a syntax error in macro contexts (though not hcat)
@info "blah"   \#
      a b c

Actually I really like this last one. Also this gets me thinking — perhaps this feature could/should be restricted to solving the multi line macro problem only? Personally I think formatting large array literals using line continuation is questionable anyway. (And if your literals are that large, perhaps it's time to load them from a binary file instead.)

@c42f (Member Author) commented Oct 12, 2018

On second thought, I could get used to .#, and it has the great advantage of currently being a syntax error, with . not being an operator:

@info "blah"   .# Comment stuff
      a b c    .# More comments
      x=1

@fredrikekre (Member)

What is wrong with

@info("blah",
      a, b, c,
      x = 1)

?

@TAGC commented Oct 12, 2018

@fredrikekre See #18612 - attempting to apply macros to functions like annotations in other languages. Wrapping the entire function in parentheses would be ugly.

@foo .#
function bar(x, y)
    ...
end

The necessity for some symbols after the macro declaration to allow for line continuation is unfortunate but it doesn't look too bad.

@c42f (Member Author) commented Oct 12, 2018

Yes, wrapping in parentheses and using commas makes the macro invocation very visually distinct from the way macros "usually look" in most Julia code. Code which looks different is harder to read.

@pkofod (Contributor) commented Oct 12, 2018

One reason I think this feature, though useful in some cases, could end up causing more frustration overall is the very example in the first post:

# Invalid - there should only be one way to do this
x = some_variable ⤸
    + other_variable
# Valid - the current, perfectly good convention for writing this
x = some_variable +
    other_variable

I know that the issue # you're fixing here is not such an example, but I want to emphasize that most of the people (according to my scientific method called "recall based on sleep deprived brain's memory™") who have complained about the lack of a continuation character have complained about this exact example. They want the + on the new line because they claim that is the way math is written (...). With this feature in Julia, there will be bug reports and complaints about "⤸ not working in math!!!11".

I'm not saying that Julia shouldn't get new features because some people might not read the documentation, I'm just wondering if it's really worth the trouble.

@arthurp commented Oct 12, 2018

It appears that most of the arguments in favor of a continuation character are based on @macro usages, and most of the arguments against it are based on using continuations in non-macro situations. @macro calls without parens already have unusual syntax; in fact, @macros look like curried calls (as in Haskell). So let me propose something different:

Since @macro with space delimited arguments already look like curried calls, use |> (as in function application) for the continuation character and only in the space delimited argument lists on @macros.

@macro a b |>
@macro c |>
function f() ... end

This matches the "trailing operator as line continuation" pattern and does not allow any strange usages in other contexts.

The downsides of this proposal are that it requires special handling of the |> operator during space-delimited argument parsing, and that using the normal |> in a space-delimited argument list would have to be parenthesized: (x |> y).

@TAGC commented Oct 12, 2018

I'm also in favour of macro-only continuation. As another possibility though, would it work to create a new syntactic element annotation which is very similar to a macro, except it always expects a closure expression as its last argument, and it allows it to be specified on the following line i.e.

annotation foo(f)
    ...
end

# Can the compiler distinguish between annotations and macros if they
# use the same prefix ("@") or would something else be required e.g. "[foo]"
@foo 
function bar()
    ...
end

If something like this were implemented, then personally I'd feel no need for line continuation syntax.

@stevengj (Member) commented Oct 12, 2018

Julia macros are much more powerful than decorators in Python or annotations in Java, because they can apply to any expression (not just functions) and perform arbitrary rewriting. I don't see a compelling reason to make Julia macro syntax look like Python decorator syntax, when both @blah function foo() ... and @blah begin ... end are perfectly reasonable. "It doesn't look like Python/Java function annotations" seems like a bad reason to introduce a new continuation syntax, especially because macros are not just annotations.

@TAGC commented Oct 12, 2018

@stevengj The reason isn't to make it look like Python. As mentioned above, it's to allow the function to be specified on a new line. Having to declare functions on the same line as macros (@blah function foo()) is ugly and really doesn't scale if you need multiple macros (see #18612).

This also has nothing to do with the power of Julia macros vs. annotations in other languages. What I'm suggesting with the idea of "annotation" is just a Julia macro with a specific restriction, therefore with all the power that Julia macros have.

@arthurp commented Oct 12, 2018

There actually is another macro that has special handling to allow it to appear on the line before a function: @doc. From the Julia docs:

To make it easier to write documentation, the parser treats the macro name @doc specially: if a call to @doc has one argument, but another expression appears after a single line break, then that additional expression is added as an argument to the macro.

So the idea of tweaking the syntax to allow cleaner layout of multiple macros used as annotations on a function (or expression) is not at all out of the question. If it were then docs in julia would have to be written:

"""
docs
""" function f() end

I think the "in favor" crowd is really just asking for the ability to apply similar treatment to other macros.
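The `@doc` special case quoted above can be observed with the parser. When `@doc` has a single argument, it absorbs the expression on the next line. An arbitrary macro does not; here the hypothetical `@other` never needs to exist, because the code is only parsed. A sketch:

```julia
# Hypothetical helper: parse a snippet as a block and return its statements.
function statements(src::AbstractString)
    block = Meta.parse("begin\n" * src * "\nend")
    return filter(x -> !(x isa LineNumberNode), block.args)
end

# `@doc` with one argument absorbs the expression on the next line...
doc_form   = statements("@doc \"Adds one.\"\naddone(x) = x + 1")
# ...while any other macro name stops at the newline.
other_form = statements("@other \"Adds one.\"\naddone(x) = x + 1")

println(length(doc_form), " ", length(other_form))  # prints "1 2"
```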

@arthurp commented Oct 12, 2018

@stevengj I think both those syntaxes come with very notable drawbacks in terms of ergonomics.

  • As discussed in #18612 (provide a line continuation syntax for nicer macro calls), annotations on the same line don't scale well to many macros (which really does happen).
  • @macro begin .. end and @macro(...) cause the block to be indented by an editor and this should not be changed because block macros are useful inside functions and often define a very special scope that should be indented for clarity.
  • @macro #=
    =# function f() end
    is probably the closest to what the "in favor" crowd want: a way to attach a macro to one statement which immediately follows it. However it still makes it a good deal harder to see the function definition because it is now prefixed instead of having function at the start of the line as usual.

My argument is that we are not following Python or Scala (which also has minor special handling for annotations because it has semi-colon inference). We are making the same argument they did: It's useful to mark declarations (and statements) with annotations and those annotations are clearer if they appear in the previous line so as not to clutter the normal declaration or statement.

@ronisbr (Member) commented Oct 12, 2018

I don't understand all this pushback against the line continuation feature. We have seen several scenarios in which a line continuation symbol is handy.

@c42f (Member Author) commented Oct 24, 2018

For the record, I've started using the #= =# trick in my code. It's kinda ugly but adequate; we'll see whether it grows on me.

@pdeffebach (Contributor) commented Oct 25, 2018

In Stata, the arguments to functions cannot be separated by commas or newlines, so this kind of thing is very idiomatic. Stata has // for comments and /// for line continuation, where anything after the /// can also be a comment.

It also provides a special mode where line endings must be given explicitly (although I don't like using it because it's tough to reason about the special parsing).

#delimit ; // new lines must now be explicitly ended with ;
gen x = 0 if 
y > 1;
#delimit cr // back to normal

@c42f (Member Author) commented Oct 26, 2018

Stata has // for comments and /// for line continuation, where stuff after the /// can also be a comment.

Very interesting, thanks. I think they must have come to the same conclusions that I did in #29273 (comment).

@Keno (Member) commented Dec 29, 2020

I think it's fair to say that this won't happen as proposed, so I'll close this. It'll be available for anybody's reference here of course.

@Keno Keno closed this Dec 29, 2020
@DilumAluthge DilumAluthge deleted the cjf/line-continuation branch March 25, 2021 21:59
Development

Successfully merging this pull request may close these issues.

Feature request: add operator to allow break lines in matrix definition